policy uncertainty
On The Presence of Double-Descent in Deep Reinforcement Learning
Veselý, Viktor, Todorov, Aleksandar, Sabatelli, Matthia
The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
Geopolitics, Geoeconomics and Risk:A Machine Learning Approach
Ortiz, Alvaro, Rodrigo, Tomasa
We introduce a novel high-frequency daily panel dataset of both markets and news-based indicators -- including Geopolitical Risk, Economic Policy Uncertainty, Trade Policy Uncertainty, and Political Sentiment -- for 42 countries across both emerging and developed markets. Using this dataset, we study how sentiment dynamics shape sovereign risk, measured by Credit Default Swap (CDS) spreads, and evaluate their forecasting value relative to traditional drivers such as global monetary policy and market volatility. Our horse-race analysis of forecasting models demonstrates that incorporating news-based indicators significantly enhances predictive accuracy and enriches the analysis, with non-linear machine learning methods -- particularly Random Forests -- delivering the largest gains. Our analysis reveals that while global financial variables remain the dominant drivers of sovereign risk, geopolitical risk and economic policy uncertainty also play a meaningful role. Crucially, their effects are amplified through non-linear interactions with global financial conditions. Finally, we document pronounced regional heterogeneity, as certain asset classes and emerging markets exhibit heightened sensitivity to shocks in policy rates, global financial volatility, and geopolitical risk.
- Asia > Russia (0.29)
- Europe > Ukraine (0.15)
- North America > Mexico (0.14)
- (45 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Foreign Policy (1.00)
- Government > Commerce (1.00)
- (2 more...)
Economic Policy Uncertainty: A Review on Applications and Measurement Methods with Focus on Text Mining Methods
Kaveh-Yazdy, Fatemeh, Zarifzadeh, Sajjad
Economic Policy Uncertainty (EPU) represents the uncertainty realized by the investors during economic policy alterations. EPU is a critical indicator in economic studies to predict future investments, the unemployment rate, and recessions. EPU values can be estimated based on financial parameters directly or implied uncertainty indirectly using the text mining methods. Although EPU is a well-studied topic within the economy, the methods utilized to measure it are understudied. In this article, we define the EPU briefly and review the methods used to measure the EPU, and survey the areas influenced by the changes in EPU level. We divide the EPU measurement methods into three major groups with respect to their input data. Examples of each group of methods are enlisted, and the pros and cons of the groups are discussed. Among the EPU measures, text mining-based ones are dominantly studied. These methods measure the realized uncertainty by taking into account the uncertainty represented in the news and publicly available sources of financial information. Finally, we survey the research areas that rely on measuring the EPU index with the hope that studying the impacts of uncertainty would attract further attention of researchers from various research fields. In addition, we propose a list of future research approaches focusing on measuring EPU using textual material.
- North America > Mexico (0.14)
- Asia > Macao (0.04)
- Asia > Japan (0.04)
- (25 more...)
PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection
Dass, Shivin, Pertsch, Karl, Zhang, Hejia, Lee, Youngwoon, Lim, Joseph J., Nikolaidis, Stefanos
Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system which automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators' mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step towards scalable robotic data collection. For code and video results, see https://clvrai.com/pato
- North America > United States (0.14)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
How to Enable Uncertainty Estimation in Proximal Policy Optimization
Bykovets, Eugene, Metz, Yannick, El-Assady, Mennatallah, Keim, Daniel A., Buhmann, Joachim M.
While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons: concepts like uncertainty and OOD states are not well defined compared to supervised learning, especially for on-policy RL methods. Secondly, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization (PPO), and present possible applicable measures. In particular, we discuss the concepts of value and policy uncertainty. The second point is addressed by implementing different uncertainty estimation methods and comparing them across a number of environments. The OOD detection performance is evaluated via a custom evaluation benchmark of in-distribution (ID) and OOD states for various RL environments. We identify a trade-off between reward and OOD detection performance. To overcome this, we formulate a Pareto optimization problem in which we simultaneously optimize for reward and OOD detection performance. We show experimentally that the recently proposed method of Masksembles strikes a favourable balance among the survey methods, enabling high-quality uncertainty estimation and OOD detection while matching the performance of original RL agents.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- (2 more...)
Multiplicative Controller Fusion: A Hybrid Navigation Strategy For Deployment in Unknown Environments
Rana, Krishan, Dasagi, Vibhavari, Talbot, Ben, Milford, Michael, Sünderhauf, Niko
Learning-based approaches often outperform hand-coded algorithmic solutions for many problems in robotics. However, learning long-horizon tasks on real robot hardware can be intractable, and transferring a learned policy from simulation to reality is still extremely challenging. We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions as an algorithmic prior during training and deployment. During training, our gated fusion approach enables the prior to guide the initial stages of exploration, increasing sample-efficiency and enabling learning from sparse long-horizon reward signals. Importantly, the policy can learn to improve beyond the performance of the sub-optimal prior since the prior's influence is annealed gradually. During deployment, the policy's uncertainty provides a reliable strategy for transferring a simulation-trained policy to the real world by falling back to the prior controller in uncertain states. We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation and demonstrate safe transfer from simulation to the real world without any fine tuning. The code for this project is made publicly available at https://sites.google.com/view/mcf-nav/home.
- Oceania > Australia > Queensland > Brisbane (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference
Ortega, Pedro A., Braun, Daniel A.
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.